Alignment-free sequence comparison with spaced k-mers

Authors

  • Marcus Boden
  • Martin Schöneich
  • Sebastian Horwege
  • Sebastian Lindner
  • Chris-Andre Leimeister
  • Burkhard Morgenstern
Abstract

Alignment-free methods are increasingly used for genome analysis and phylogeny reconstruction since they circumvent various difficulties of traditional approaches that rely on multiple sequence alignments. In particular, they are much faster than alignment-based methods. Most alignment-free approaches work by analyzing the k-mer composition of sequences. In this paper, we propose to use ‘spaced k-mers’, i.e. patterns of deterministic and ‘don’t care’ positions, instead of contiguous k-mers. Using simulated and real-world sequence data, we demonstrate that this approach produces better phylogenetic trees than alignment-free methods that rely on contiguous k-mers. In addition, distances calculated with spaced k-mers appear to be statistically more stable than distances based on contiguous k-mers.

1998 ACM Subject Classification: J.3 Life and Medical Sciences
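
To make the idea of a spaced k-mer concrete, here is a minimal sketch in Python. It assumes a pattern is written as a string of '1' (deterministic/match) and '0' (don't-care) characters. For every window of the pattern's length, only the characters at '1' positions are kept. A pattern of all '1's reduces to ordinary contiguous k-mers. The comparison step is an illustrative assumption, not necessarily the distance used in the paper: it takes the Euclidean distance between relative spaced-word frequencies and averages it over several patterns. The function names (spaced_words, profile, spaced_distance) are hypothetical.

```python
from collections import Counter
import math


def spaced_words(seq, pattern):
    """Return the spaced words of seq under a binary pattern such as '1101'.

    For each window of len(pattern) characters, keep only the characters
    at '1' (match) positions; '0' positions are don't-care and are skipped.
    """
    if set(pattern) - {"0", "1"} or "1" not in pattern:
        raise ValueError("pattern must consist of '0'/'1' and contain at least one '1'")
    match_pos = [i for i, c in enumerate(pattern) if c == "1"]
    width = len(pattern)
    return ["".join(seq[start + i] for i in match_pos)
            for start in range(len(seq) - width + 1)]


def profile(seq, pattern):
    """Relative frequency of each spaced word (empty if seq is shorter than pattern)."""
    words = spaced_words(seq, pattern)
    n = len(words)
    return {w: c / n for w, c in Counter(words).items()} if n else {}


def euclidean(p, q):
    """Euclidean distance between two sparse frequency vectors stored as dicts."""
    keys = set(p) | set(q)
    return math.sqrt(sum((p.get(k, 0.0) - q.get(k, 0.0)) ** 2 for k in keys))


def spaced_distance(a, b, patterns):
    """Assumed distance: average the per-pattern Euclidean distances over all patterns."""
    return sum(euclidean(profile(a, P), profile(b, P)) for P in patterns) / len(patterns)
```

For example, `spaced_words("ACGTAC", "1101")` yields `["ACT", "CGA", "GTC"]`: positions 0, 1 and 3 of each length-4 window are kept. The contiguous pattern `"111"` applied to `"ACGT"` gives the plain 3-mers `["ACG", "CGT"]`. Because neighbouring spaced words overlap less than contiguous k-mers do, their counts are less strongly correlated. This is one plausible reading of why the abstract reports more stable distances, but the sketch does not demonstrate it.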

Similar resources

gpALIGNER: A Fast Algorithm for Global Pairwise Alignment of DNA Sequences

Bioinformatics, through the sequencing of the full genomes for many species, is increasingly relying on efficient global alignment tools exhibiting both high sensitivity and specificity. Many computational algorithms have been applied for solving the sequence alignment problem. Dynamic programming, statistical methods, approximation and heuristic algorithms are the most common methods appli...

Fast alignment-free sequence comparison using spaced-word frequencies

MOTIVATION Alignment-free methods for sequence comparison are increasingly used for genome analysis and phylogeny reconstruction; they circumvent various difficulties of traditional alignment-based approaches. In particular, alignment-free methods are much faster than pairwise or multiple alignments. They are, however, less accurate than methods based on sequence alignment. Most alignment-free ...

Spaced words and kmacs: fast alignment-free sequence comparison based on inexact word matches

In this article, we present a user-friendly web interface for two alignment-free sequence-comparison methods that we recently developed. Most alignment-free methods rely on exact word matches to estimate pairwise similarities or distances between the input sequences. By contrast, our new algorithms are based on inexact word matches. The first of these approaches uses the relative frequencies of...

A Novel Pseudo-Alignment Approach to Fast Genomic Sequence Comparison

Standard methods for sequence analysis and phylogeny reconstruction are based on (multiple) sequence alignments. These methods are known to be accurate but if larger genomic sequences are to be analysed they reach their limits. Consequently, faster but less precise alignment-free methods are increasingly used for genomic sequence analysis. In this work, a novel approach to fast genomic sequence...

rasbhari: Optimizing Spaced Seeds for Database Searching, Read Mapping and Alignment-Free Sequence Comparison

Many algorithms for sequence analysis rely on word matching or word statistics. Often, these approaches can be improved if binary patterns representing match and don't-care positions are used as a filter, such that only those positions of words are considered that correspond to the match positions of the patterns. The performance of these approaches, however, depends on the underlying patterns....

Publication date: 2013